Abstract: In the economic literature, geographic distances are considered fundamental factors to be included in any theoretical model whose aim is the quantification of the trade between countries. Quantitatively, distances enter into the so-called gravity models that successfully predict the weight of non-zero trade flows. However, it has been recently shown that gravity models fail to reproduce the binary topology of the World Trade Web. In this paper a different approach is presented: the formalism of exponential random graphs is used and the distances are treated as constraints, to be imposed on a previously chosen ensemble of graphs. Then, the information encoded in the geographical distances is used to explain the binary structure of the World Trade Web, by testing it on the degree-degree correlations and the reciprocity structure. This leads to the definition of a novel null model that combines spatial and non-spatial effects. The effectiveness of spatial constraints is compared to that of non-spatial ones by means of the Akaike Information Criterion and the Bayesian Information Criterion. Even if it is commonly believed that the World Trade Web is strongly dependent on the distances, what emerges from our analysis is that distances do not play a crucial role in shaping the World Trade Web binary structure and that the information encoded into the reciprocity is far more useful in explaining the observed patterns.

I. Introduction 

We usually consider networks only from the topological point of view, with the adjacency matrix encoding all the necessary information about the connections between nodes. However, many networks are also embedded into a metric space, and their vertices have positions described by metric coordinates. In these networks, distances are naturally induced between nodes, and geometric proximity represents a further kind of connectedness between vertices. An interesting goal then becomes quantifying the influence that geometric distances have on the purely topological connections. A clear example is provided by geographic distances.

The role of distances in shaping the World Trade Web (WTW), i.e. the network of import-export trade relationships among all world countries, enters the economic literature only through the class of models called gravity models [1], [2]. The latter, mimicking the equation of the gravitational potential, predict an intensity of trade between two countries, i and j, which (in the simplest case) is directly proportional to their Gross Domestic Products (GDPs) and inversely proportional to their geographical distance. So the fundamental ingredients in the economic recipe are the GDPs of the involved countries and the geographic distances between them, which discourage distant countries from establishing intense trading relationships.
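To make the last point concrete, the following minimal sketch evaluates the simplest gravity model, $w_{ij} = g\,\mathrm{GDP}_i\,\mathrm{GDP}_j/d_{ij}$, on three countries. The constant `g`, the GDP values and the distances are hypothetical placeholders, not data used in this paper. Since every factor is strictly positive, every predicted flow is strictly positive as well.

```python
from itertools import permutations


def gravity_flows(gdp, dist, g=1.0):
    """Simplest gravity model: w_ij = g * GDP_i * GDP_j / d_ij for every ordered pair i != j."""
    n = len(gdp)
    return {(i, j): g * gdp[i] * gdp[j] / dist[i][j] for i, j in permutations(range(n), 2)}


# Hypothetical toy data for three countries (not taken from the WTW database)
gdp = [10.0, 2.0, 0.5]
dist = [[0.0, 1.0, 4.0],
        [1.0, 0.0, 2.0],
        [4.0, 2.0, 0.0]]

flows = gravity_flows(gdp, dist)
# Binary projection: a directed link exists wherever the predicted flow is positive
links = [pair for pair, w in flows.items() if w > 0]
print(len(links), "of", len(gdp) * (len(gdp) - 1), "possible directed links")  # 6 of 6 possible directed links
print(flows[(0, 1)] == flows[(1, 0)])  # True: the simplest form is symmetric
```

Because no predicted flow is zero, the binary projection of this model is the complete directed graph, and in its simplest form it is also symmetric; asymmetric variants need extra parameters (usually exponents), as noted in the next paragraph.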

Even if gravity models have been shown to successfully predict the weighted structure of the WTW [3], this approach has three limitations: 1) the predicted network is exclusively weighted; 2) the induced topological structure is trivial; 3) the models provide no reciprocity structure for the trade flows between the same pair of nodes. In fact, gravity models cannot predict zero trade between any two countries (exactly as the gravitational force between any two bodies cannot be zero), thus producing a trivially, fully connected World Trade Web. Moreover, by using only the aforementioned quantities, even if asymmetric flows can be induced (by means of additional parameters, usually exponents), a "reciprocal flow" cannot be defined between countries, so these models fail to reproduce the strong observed reciprocity […] of trade exchanges. Variations of the gravity models (the so-called zero-inflated gravity models […]) have been defined to overcome the first two problems and to predict also the existence of a link (and not only its weight, once its existence has been observed). However, the prediction thus obtained does not seem to be good at all [3], with the consequence that the whole topological structure has to be known in advance to successfully reproduce the observed weights.

In this paper we overcome these limitations by using a different approach: the exponential random graph formalism. In this theoretical framework, geographic distances are considered as given, exactly as in the previous case, but are used to calculate the probability that any two nodes establish a directed connection (a trading relationship, seen as export or import); the only additional information comes from a chosen topological constraint. Moreover, this framework allows us to compare the effectiveness of the distances in explaining the observed patterns with that of other well-known quantities, such as the degree sequence and the reciprocity. What turns out is that distances do not add significantly more information to what is already predicted by the degree sequence alone which, in turn, is known to be related to the GDPs of the world countries […].

The results presented in what follows concern the WTW considered in its binary, directed representation (BDN, in what follows), as obtained from the database in [10]. Following [11], our aim is to disentangle spatial and non-spatial effects in the real WTW. To this end, we compare the observed WTW with the predictions of various null models that control either for purely topological effects or for a combination of topological and spatial effects. Finally, we introduce a way to quantitatively assess the significance of the information gained by adding geographic distances to non-spatial models of trade.

II. Null models 

The method we use to introduce null models of the World Trade Web implements a recently proposed procedure [5], [12], developed within the exponential random graph theoretical framework [13], [14], [15]. The method is composed of two main steps. The first one is the maximization of the Shannon entropy over a previously chosen set of graphs, $\mathcal{G}$,

$$S = -\sum_{G\in\mathcal{G}} P(G)\ln P(G) \qquad (1)$$

under a number of imposed constraints [12], [16], […], generically indicated as

$$\sum_{G\in\mathcal{G}} P(G) = 1, \qquad \sum_{G\in\mathcal{G}} P(G)\,\pi_a(G) = \langle\pi_a\rangle, \quad \forall a \qquad (2)$$

(note the generality of the formalism above: G can be a directed, undirected, binary or weighted network). We can immediately choose the set $\mathcal{G}$ as the grandcanonical ensemble of BDNs, i.e. the collection of networks with the same number of nodes as the observed one (say N) and a number of links, L, varying from zero to the maximum (i.e. N(N-1)).
This prescription leads to the exponential distribution over the previously chosen ensemble

$$P(G|\vec\theta) = \frac{e^{-H(G,\vec\theta)}}{Z(\vec\theta)} \qquad (3)$$

whose coefficients are functions of the Hamiltonian, $H(G,\vec\theta) = \sum_a \theta_a \pi_a(G)$, which is the linear combination of the chosen constraints. The normalization constant, $Z(\vec\theta) = \sum_{G\in\mathcal{G}} e^{-H(G,\vec\theta)}$, is the partition function [12], […].



The second step prescribes how to numerically evaluate the unknown Lagrange multipliers. Given a real network, $G^*$, let us consider the log-likelihood function $\ln\mathcal{L}(\vec\theta) = \ln P(G^*|\vec\theta)$ and maximize it with respect to the unknown parameters [9], [12]. In other words, we have to find the value $\vec\theta^*$ of the multipliers satisfying the system

$$\left.\frac{\partial \ln\mathcal{L}(\vec\theta)}{\partial\theta_a}\right|_{\vec\theta=\vec\theta^*} = 0, \quad \forall a \qquad (4)$$

or, equivalently,

$$\pi_a(G^*) = \langle\pi_a\rangle(\vec\theta^*) \equiv \langle\pi_a\rangle^*, \quad \forall a \qquad (5)$$

i.e. a list of equations imposing the expected value of each constraint to be equal to its observed value [9], [12]. Note that the term "expected", here, refers to the weighted average taken over the grandcanonical ensemble, the weights being the probability coefficients defined above.

So, once the unknown parameters have been found, it is possible to evaluate the expected value of any other topological quantity of interest, X:

$$\langle X\rangle^* = \sum_{G\in\mathcal{G}} X(G)\,P(G|\vec\theta^*). \qquad (6)$$



Because of the difficulty of analytically calculating the expected value of the quantities commonly used in complex network theory, it is often necessary to rely upon the linear approximation method: $\langle X\rangle^* \simeq X(\langle G\rangle^*)$.

This is a very general prescription, valid for binary, weighted, undirected or directed networks. Since the WTW is considered here in its binary, directed representation, the generic adjacency matrix G will be indicated, from now on, with the usual letter A; so, $\langle A\rangle^*$ indicates the expected adjacency matrix, whose elements are $\langle a_{ij}\rangle^* = p^*_{ij}$.
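As an illustration of what the linear approximation $\langle X\rangle^* \simeq X(\langle A\rangle^*)$ can miss, the sketch below takes a toy ensemble of directed graphs on three nodes with independent links, each present with a hypothetical probability $p_{ij}$ (the factorized form shared by the DCM and the DDCM described below), and computes exact ensemble averages as in (6) by enumerating all $2^{N(N-1)} = 64$ graphs. For the number of reciprocated links, $\sum_{i\neq j} a_{ij}a_{ji}$, the two agree, because $a_{ij}$ and $a_{ji}$ are independent here; for the nonlinear quantity $L^2$, the exact average exceeds $X(\langle A\rangle^*)$ by the variance $\sum_{i\neq j} p_{ij}(1-p_{ij})$.

```python
from itertools import permutations, product

# Hypothetical link probabilities p_ij for a 3-node directed ensemble (i != j)
pairs = list(permutations(range(3), 2))  # the N(N-1) = 6 directed pairs
p = {(0, 1): 0.9, (1, 0): 0.6, (0, 2): 0.3,
     (2, 0): 0.2, (1, 2): 0.5, (2, 1): 0.7}


def prob(a):
    """P(A) for independent links: product of p_ij^a_ij (1 - p_ij)^(1 - a_ij)."""
    out = 1.0
    for pair in pairs:
        out *= p[pair] if a[pair] else 1.0 - p[pair]
    return out


def links(a):
    """L = total number of directed links (works on 0/1 entries or on probabilities)."""
    return sum(a[pair] for pair in pairs)


def reciprocated(a):
    """L<-> = sum_{i != j} a_ij a_ji."""
    return sum(a[(i, j)] * a[(j, i)] for i, j in pairs)


def exact_average(x):
    """<X> = sum over all 2^6 graphs of X(A) P(A), as in (6)."""
    total = 0.0
    for bits in product((0, 1), repeat=len(pairs)):
        a = dict(zip(pairs, bits))
        total += x(a) * prob(a)
    return total


# Linear approximation X(<A>): plug the expected matrix <a_ij> = p_ij into X
lin_recip = reciprocated(p)
lin_l2 = links(p) ** 2

print(exact_average(reciprocated), lin_recip)            # equal up to rounding
print(exact_average(lambda a: links(a) ** 2), lin_l2)    # differ by the variance of L
```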

The next four subsections will be devoted to the explanation 
of the null models considered in the present analysis. 

A. Directed Configuration Model (DCM) 

The DCM is one of the most widely used null models in the complex networks literature [15]. We choose it as our baseline model because it has been shown to reproduce remarkably well several properties of the WTW, including degree correlations and clustering coefficients [18]. The DCM Hamiltonian is the following:

$$H(A,\vec\theta) = \sum_{i=1}^{N}\left(\alpha_i k_i^{out} + \beta_i k_i^{in}\right) = \sum_{i=1}^{N}\sum_{j(\neq i)=1}^{N}\left(\alpha_i + \beta_j\right)a_{ij} \qquad (7)$$



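The linearity of the constraints (the in-degree and out-degree sequences) in the adjacency matrix elements implies that the probability coefficient for the generic network, A, factorizes as a product over the directed pairs of nodes

$$P(A|\vec\theta) = \prod_{i=1}^{N}\prod_{j(\neq i)=1}^{N} p_{ij}^{a_{ij}}\left(1-p_{ij}\right)^{1-a_{ij}} \qquad (8)$$

having defined $p_{ij} = \frac{x_i y_j}{1 + x_i y_j}$, after having posed $x_i = e^{-\alpha_i}$, $y_i = e^{-\beta_i}$. The likelihood function is

$$\ln\mathcal{L}_{DCM} = \sum_{i=1}^{N}\left[k_i^{out}\ln x_i + k_i^{in}\ln y_i\right] - \sum_{i=1}^{N}\sum_{j(\neq i)=1}^{N}\ln\left(1 + x_i y_j\right) \qquad (9)$$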



and the maximum-likelihood prescription becomes

$$k_i^{out} = \sum_{j(\neq i)=1}^{N}\frac{x_i y_j}{1 + x_i y_j}, \qquad k_i^{in} = \sum_{j(\neq i)=1}^{N}\frac{x_j y_i}{1 + x_j y_i}, \qquad \forall i.$$

Once the unknown variables are numerically determined, the expected value of any adjacency matrix entry becomes

$$p^*_{ij} = \frac{x^*_i y^*_j}{1 + x^*_i y^*_j} \qquad (10)$$

and can be used to calculate the expected value of any other topological quantity of interest.
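The DCM system generally has to be solved numerically. A minimal sketch of one common option, a fixed-point iteration obtained by rewriting the equations as $x_i = k_i^{out}/\sum_{j(\neq i)} y_j/(1+x_iy_j)$ and $y_i = k_i^{in}/\sum_{j(\neq i)} x_j/(1+x_jy_i)$, is given below; it is not necessarily the solver used for the results in this paper, and the function names are hypothetical. The toy network is also hypothetical, chosen so that every node has out- and in-degree 2 out of 3 possible links, in which case the solution is $x_i = y_i = \sqrt{2}$ and $p^*_{ij} = 2/3$.

```python
def fit_dcm(a, iterations=5000, tol=1e-12):
    """Fixed-point solution of the DCM system for a binary directed adjacency matrix `a`.

    Assumes 0 < k_out_i, k_in_i < N - 1 for every node (otherwise x_i or y_i runs to 0 or infinity).
    """
    n = len(a)
    k_out = [sum(a[i][j] for j in range(n) if j != i) for i in range(n)]
    k_in = [sum(a[j][i] for j in range(n) if j != i) for i in range(n)]
    # Starting point (an assumption; any positive values can be used)
    x = [k / n for k in k_out]
    y = [k / n for k in k_in]
    for _ in range(iterations):
        x_new = [k_out[i] / sum(y[j] / (1 + x[i] * y[j]) for j in range(n) if j != i)
                 for i in range(n)]
        y_new = [k_in[i] / sum(x[j] / (1 + x[j] * y[i]) for j in range(n) if j != i)
                 for i in range(n)]
        delta = max(abs(u - v) for u, v in zip(x + y, x_new + y_new))
        x, y = x_new, y_new
        if delta < tol:
            break
    return x, y


def dcm_probabilities(x, y):
    """p_ij = x_i y_j / (1 + x_i y_j), as in (10), with p_ii = 0."""
    n = len(x)
    return [[0.0 if i == j else x[i] * y[j] / (1 + x[i] * y[j]) for j in range(n)]
            for i in range(n)]


# Hypothetical toy network: a_ij = 1 if country i exports to country j
a = [[0, 1, 1, 0],
     [1, 0, 0, 1],
     [0, 1, 0, 1],
     [1, 0, 1, 0]]
x, y = fit_dcm(a)
p = dcm_probabilities(x, y)
print([round(sum(row), 6) for row in p])  # expected out-degrees, matching the observed ones (all 2)
```

Since the iteration is capped, it is advisable to check afterwards that the expected degrees actually match the observed ones, as the last line does.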

B. Directed Configuration Model with Distances (DDCM) 

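This model is one of the two main novelties that we introduce in the present paper. The DDCM Hamiltonian consists of the DCM Hamiltonian with one additional constraint: a global quantity taking into account the information carried by the geographic distances, i.e.

$$H(A,\vec\theta) = \sum_{i=1}^{N}\left(\alpha_i k_i^{out} + \beta_i k_i^{in}\right) + \gamma\sum_{i=1}^{N}\sum_{j(\neq i)=1}^{N} a_{ij}d_{ij} = \sum_{i=1}^{N}\sum_{j(\neq i)=1}^{N}\left(\alpha_i + \beta_j + \gamma d_{ij}\right)a_{ij}. \qquad (11)$$

The information about the geographic distances is condensed into a global index, fixing the total sum of the distances between connected vertices. The reason why we introduce this model is that, according to recent results [11], the spatial effects measured by the quantity $W_{tot} = \sum_i\sum_{j(\neq i)} a_{ij}d_{ij} = \sum_i\sum_{j(>i)}\left(a_{ij} + a_{ji}\right)d_{ij}$, even if weak, are not reproduced by the DCM. The DDCM reproduces those effects by construction, and we will later introduce a way to quantify the corresponding information gain. Note that the Hamiltonian is again linear in the adjacency matrix entries: so, the probability of a given configuration factorizes again as the product $P(A|\vec\theta) = \prod_i\prod_{j(\neq i)} p_{ij}^{a_{ij}}(1-p_{ij})^{1-a_{ij}}$, but with a different $p_{ij}$ coefficient:

$$p_{ij} = \frac{x_i y_j z^{d_{ij}}}{1 + x_i y_j z^{d_{ij}}} \qquad (12)$$

(where the Lagrange multipliers have been reabsorbed into the definition of the hidden variables: $x_i = e^{-\alpha_i}$, $y_i = e^{-\beta_i}$, $\forall i$, $z = e^{-\gamma}$). The likelihood function is

$$\ln\mathcal{L}_{DDCM} = \sum_{i=1}^{N}\left[k_i^{out}\ln x_i + k_i^{in}\ln y_i\right] + W_{tot}\ln z - \sum_{i=1}^{N}\sum_{j(\neq i)=1}^{N}\ln\left(1 + x_i y_j z^{d_{ij}}\right) \qquad (13)$$

and the maximization of the likelihood function leads to the following system to be solved:

$$k_i^{out} = \sum_{j(\neq i)}\frac{x_i y_j z^{d_{ij}}}{1 + x_i y_j z^{d_{ij}}},\ \forall i; \qquad k_i^{in} = \sum_{j(\neq i)}\frac{x_j y_i z^{d_{ji}}}{1 + x_j y_i z^{d_{ji}}},\ \forall i; \qquad W_{tot} = \sum_i\sum_{j(\neq i)}\frac{x_i y_j z^{d_{ij}}\,d_{ij}}{1 + x_i y_j z^{d_{ij}}} \qquad (14)$$

where every index runs from 1 to N. Once the previous system has been solved, we can use the $p^*_{ij}$ in (12) to evaluate the expected value of all the topological quantities of interest. Note that, by posing $\gamma = 0$ ($z = 1$), we recover the usual DCM.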
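The only change with respect to the DCM is the factor $z^{d_{ij}}$ in (12). The sketch below, with hypothetical hidden variables, distances and function names, evaluates the DDCM probabilities and the expected total distance $\langle W_{tot}\rangle = \sum_i\sum_{j(\neq i)} d_{ij}p_{ij}$, i.e. the right-hand side of the last equation of the DDCM system. It checks two properties that follow directly from (12): $z = 1$ recovers the DCM probabilities, and, for fixed $x_i$ and $y_i$, a value $z < 1$ penalizes long-distance links and lowers $\langle W_{tot}\rangle$. Solving the full system is not shown; any particular solver would be an assumption, not the authors' procedure.

```python
def ddcm_probabilities(x, y, z, d):
    """p_ij = x_i y_j z^d_ij / (1 + x_i y_j z^d_ij), as in (12), with p_ii = 0."""
    n = len(x)
    p = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i != j:
                t = x[i] * y[j] * z ** d[i][j]
                p[i][j] = t / (1 + t)
    return p


def expected_total_distance(p, d):
    """<W_tot> = sum_i sum_{j != i} d_ij p_ij."""
    n = len(p)
    return sum(d[i][j] * p[i][j] for i in range(n) for j in range(n) if i != j)


# Hypothetical hidden variables and symmetric distances for three countries
x = [1.5, 0.8, 0.4]
y = [1.2, 0.9, 0.5]
d = [[0.0, 1.0, 3.0],
     [1.0, 0.0, 2.0],
     [3.0, 2.0, 0.0]]

p_dcm = [[0.0 if i == j else x[i] * y[j] / (1 + x[i] * y[j]) for j in range(3)] for i in range(3)]
p_z1 = ddcm_probabilities(x, y, 1.0, d)    # gamma = 0, i.e. z = 1: plain DCM
p_z05 = ddcm_probabilities(x, y, 0.5, d)   # z < 1 (gamma > 0): long links are penalized

print(p_z1 == p_dcm)  # True
print(expected_total_distance(p_z05, d) < expected_total_distance(p_z1, d))  # True
```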

C. Reciprocal Configuration Model (RCM) 

The third null model we consider is the Reciprocal Configuration Model (RCM) [12]. This model was defined to take into account the topological information encoded into the reciprocity structure of the observed network [5], […]. It was shown that the RCM reproduces almost perfectly all the triadic motifs of the WTW [7], which are instead not reproduced by the DCM. The RCM will therefore compete with the DDCM in improving the fit to the real network. The Hamiltonian of the RCM is the following:

$$H(A,\vec\theta) = \sum_{i=1}^{N}\left(\alpha_i k_i^{\rightarrow} + \beta_i k_i^{\leftarrow} + \gamma_i k_i^{\leftrightarrow}\right) \qquad (15)$$

and, unlike the previous two, it is not linear in the adjacency matrix entries. In fact, the imposed constraints are three degree sequences, respectively defined as: the non-reciprocated out-degree sequence, where $k_i^{\rightarrow} = \sum_{j(\neq i)} a_{ij}^{\rightarrow} = \sum_{j(\neq i)} a_{ij}(1 - a_{ji})$; the non-reciprocated in-degree sequence, where $k_i^{\leftarrow} = \sum_{j(\neq i)} a_{ij}^{\leftarrow} = \sum_{j(\neq i)} a_{ji}(1 - a_{ij})$; the reciprocated degree sequence, where $k_i^{\leftrightarrow} = \sum_{j(\neq i)} a_{ij}^{\leftrightarrow} = \sum_{j(\neq i)} a_{ij}a_{ji}$. All the above sequences are defined in terms of non-linear combinations of the $a_{ij}$'s [12], [5], […]. Nevertheless, the model is analytically solvable; the likelihood function is

$$\ln\mathcal{L}_{RCM} = \sum_{i=1}^{N}\left(k_i^{\rightarrow}\ln x_i + k_i^{\leftarrow}\ln y_i + k_i^{\leftrightarrow}\ln z_i\right) - \sum_{i=1}^{N}\sum_{j(<i)}\ln\left(1 + x_i y_j + x_j y_i + z_i z_j\right) \qquad (16)$$

(with $x_i = e^{-\alpha_i}$, $y_i = e^{-\beta_i}$, $z_i = e^{-\gamma_i}$), and the maximization of the likelihood function leads to the following system to be solved:

$$k_i^{\rightarrow} = \sum_{j(\neq i)}\frac{x_i y_j}{1 + x_i y_j + x_j y_i + z_i z_j}, \quad k_i^{\leftarrow} = \sum_{j(\neq i)}\frac{x_j y_i}{1 + x_i y_j + x_j y_i + z_i z_j}, \quad k_i^{\leftrightarrow} = \sum_{j(\neq i)}\frac{z_i z_j}{1 + x_i y_j + x_j y_i + z_i z_j}, \quad \forall i \qquad (17)$$

where, now, we have three different probability coefficients, $(p^{\rightarrow}_{ij})^*$, $(p^{\leftarrow}_{ij})^*$ and $(p^{\leftrightarrow}_{ij})^*$, respectively the generic addendum of the three equations above.
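Because the RCM couples $a_{ij}$ and $a_{ji}$, it is the dyad $(i, j)$, rather than the single matrix entry, that is described by independent probabilities. The four states of a dyad (only $i \to j$, only $j \to i$, both, neither) have probabilities proportional to $x_iy_j$, $x_jy_i$, $z_iz_j$ and 1, which is where the common denominator in the RCM likelihood and system comes from. The sketch below, with hypothetical hidden variables and function names, evaluates these probabilities and the expected reciprocated degrees $\langle k_i^{\leftrightarrow}\rangle$.

```python
def rcm_dyad(xi, yi, zi, xj, yj, zj):
    """Probabilities of the four states of the dyad (i, j) under the RCM."""
    norm = 1 + xi * yj + xj * yi + zi * zj
    return {
        "i->j only": xi * yj / norm,  # p^->_ij
        "j->i only": xj * yi / norm,  # p^<-_ij
        "both": zi * zj / norm,       # p^<->_ij
        "none": 1 / norm,
    }


def expected_reciprocated_degrees(x, y, z):
    """<k_i^<->> = sum_{j != i} z_i z_j / (1 + x_i y_j + x_j y_i + z_i z_j)."""
    n = len(x)
    return [sum(rcm_dyad(x[i], y[i], z[i], x[j], y[j], z[j])["both"] for j in range(n) if j != i)
            for i in range(n)]


# Hypothetical hidden variables for three countries
x = [0.5, 0.2, 0.1]
y = [0.3, 0.4, 0.2]
z = [2.0, 1.0, 0.5]

dyad = rcm_dyad(x[0], y[0], z[0], x[1], y[1], z[1])
print(round(sum(dyad.values()), 12))  # 1.0: the four states are exhaustive
print(expected_reciprocated_degrees(x, y, z))
```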

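D. Global Reciprocity Model (GRM)

This model is a simplified version of the RCM [19], where the reciprocity structure of the network is condensed into a global quantity, i.e. the total number of reciprocated links: $L^{\leftrightarrow} = \sum_i k_i^{\leftrightarrow} = \sum_i\sum_{j(\neq i)} a_{ij}a_{ji}$. It can be obtained from the RCM by posing $\gamma_i = \alpha_i + \beta_i + \delta$, $\forall i$. The constraints become $k_i^{out}$, $k_i^{in}$, $\forall i$, and $L^{\leftrightarrow}$. The likelihood function is

$$\ln\mathcal{L}_{GRM} = \sum_{i=1}^{N}\left(k_i^{out}\ln x_i + k_i^{in}\ln y_i\right) + L^{\leftrightarrow}\ln z - \ln Z \qquad (18)$$

with $\ln Z = \sum_{i=1}^{N}\sum_{j(<i)}\ln\left(1 + x_i y_j + x_j y_i + x_i x_j y_i y_j z^2\right)$ and $z = e^{-\delta}$, and the maximization of the likelihood function leads to the following system to be solved:

$$k_i^{out} = \sum_{j(\neq i)}\frac{x_i y_j + x_i x_j y_i y_j z^2}{1 + x_i y_j + x_j y_i + x_i x_j y_i y_j z^2},\ \forall i; \quad k_i^{in} = \sum_{j(\neq i)}\frac{x_j y_i + x_i x_j y_i y_j z^2}{1 + x_i y_j + x_j y_i + x_i x_j y_i y_j z^2},\ \forall i; \quad L^{\leftrightarrow} = \sum_i\sum_{j(\neq i)}\frac{x_i x_j y_i y_j z^2}{1 + x_i y_j + x_j y_i + x_i x_j y_i y_j z^2} \qquad (19)$$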

The GRM can also be regarded as an enriched version of the DCM, to which the information about the global reciprocity has been added.
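The reduction of the GRM to the RCM can be checked numerically: posing $\gamma_i = \alpha_i + \beta_i + \delta$ means $z_i = e^{-\gamma_i} = x_i y_i z$, with $z = e^{-\delta}$, so that $z_i z_j = x_i x_j y_i y_j z^2$, the term that appears in the GRM denominators. The sketch below uses hypothetical values of the multipliers for two nodes.

```python
import math

# Hypothetical Lagrange multipliers for two nodes and the global reciprocity multiplier delta
alpha = [0.3, -0.2]
beta = [0.1, 0.4]
delta = -0.5

x = [math.exp(-a) for a in alpha]  # x_i = e^{-alpha_i}
y = [math.exp(-b) for b in beta]   # y_i = e^{-beta_i}
z = math.exp(-delta)               # z = e^{-delta}

# RCM node-specific variables obtained by posing gamma_i = alpha_i + beta_i + delta
z_node = [math.exp(-(alpha[i] + beta[i] + delta)) for i in range(2)]

grm_term = x[0] * x[1] * y[0] * y[1] * z ** 2  # term in the GRM denominator
rcm_term = z_node[0] * z_node[1]               # z_i z_j in the RCM denominator
print(math.isclose(grm_term, rcm_term))  # True

# Probability that the dyad (0, 1) is reciprocated under the GRM
p_both = grm_term / (1 + x[0] * y[1] + x[1] * y[0] + grm_term)
print(p_both)
```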

III. Statistical validation 

We now have four null models with which to calculate the expected value of the topological quantities of interest. The second main novelty we introduce in this paper is the identification of a statistically sound procedure to compare these null models and to choose the most effective one among them.

A. Likelihood ratio 

A first attempt could be that of comparing the values of the likelihood at their stationary points (i.e. their maxima): the higher the value, the better the model describes the considered network. A more quantitative way of testing the effectiveness of two competing null models (say $NM_i$ and $NM_j$) is the calculation of their likelihood ratio, simply defined (in logarithmic form) as

$$\ln LR_{NM_i/NM_j} = \ln\mathcal{L}_{NM_i}(\vec\theta^*_i) - \ln\mathcal{L}_{NM_j}(\vec\theta^*_j) \qquad (20)$$

where the symbols $\vec\theta^*_i$ and $\vec\theta^*_j$ indicate the two different sets of Lagrange multipliers. However, the likelihood ratio test suffers from three severe limitations […].

The first one lies in the fact that the null models have to be nested: the i-hypothesis has to be a special case of the j-hypothesis. While the DCM and the DDCM are nested, and so are the GRM and the DCM, the RCM and the DDCM are not nested.

The second one lies in the number of parameters. In fact, as the number of parameters rises, the agreement between the model and the observed network increases, too. So, even when considering nested models, we could arbitrarily improve the i-hypothesis by simply adding more and more constraints. The drawback of this procedure is the risk of overfitting.

The third one lies in the number of models that can be tested: only two alternative hypotheses can be compared at a time, so that, instead of carrying out a global comparison over the whole set of models, we would have to ignore all the others.

So, we need a criterion to choose among more than two competing null models, possibly not nested, which also discounts the number of parameters used to define them.

B. Akaike Information Criterion (AIC) 

The Akaike Information Criterion suits our needs for selecting among several models […], [22], [23], by simply prescribing to calculate the following quantity

$$AIC_{NM_i} = 2K_{NM_i} - 2\ln\mathcal{L}(\vec\theta^*_{NM_i}) \qquad (21)$$

i.e. the difference between twice the number of parameters of the null model $NM_i$ and twice its log-likelihood, evaluated at its maximum, for every considered null model. For the four considered cases, we have: $K_{DCM} = 2N$, $K_{DDCM} = 2N+1$, $K_{GRM} = 2N+1$, $K_{RCM} = 3N$. Then, the recipe prescribes to choose the null model with the lowest AIC.
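A minimal sketch of (21), together with the parameter counts listed above, follows. The log-likelihood values and N = 10 are hypothetical numbers chosen only to show the bookkeeping; they are not results of this paper. In this made-up example the RCM has a higher likelihood than the GRM, but its N - 1 extra parameters cost more than the likelihood gain, so the GRM obtains the lowest AIC.

```python
def n_parameters(model, n_nodes):
    """Number of free parameters K of the four null models, for N = n_nodes."""
    counts = {
        "DCM": 2 * n_nodes,       # x_i, y_i
        "DDCM": 2 * n_nodes + 1,  # x_i, y_i, z
        "GRM": 2 * n_nodes + 1,   # x_i, y_i, z
        "RCM": 3 * n_nodes,       # x_i, y_i, z_i
    }
    return counts[model]


def aic(log_likelihood, k):
    """AIC = 2K - 2 ln L(theta*), as in (21)."""
    return 2 * k - 2 * log_likelihood


# Hypothetical maximized log-likelihoods for N = 10 nodes (illustrative numbers only)
N = 10
loglik = {"DCM": -120.0, "DDCM": -118.0, "GRM": -110.0, "RCM": -104.0}
scores = {m: aic(loglik[m], n_parameters(m, N)) for m in loglik}
best = min(scores, key=scores.get)
print(scores)
print("lowest AIC:", best)  # lowest AIC: GRM
```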

C. Akaike weights 

AIC simply tells us which model is the best among those considered in the set. However, to quantify the improvement obtained by choosing the best model with respect to the others, the so-called Akaike weights can also be computed, defined as

$$w^{AIC}_{NM_i} = \frac{e^{-\Delta_{NM_i}/2}}{\sum_{r=1}^{R} e^{-\Delta_{NM_r}/2}} \qquad (22)$$

where $\Delta_{NM_i} = AIC_{NM_i} - \min\{AIC_{NM_r}\}_{r=1}^{R}$, R being the total number of considered null models. The models with substantial support should have $\Delta < 2$, the models with less support should have $4 < \Delta < 7$, and models with $\Delta > 10$ have essentially no support [22], [23].

The Akaike weights can be interpreted as the probability that the considered model is, in fact, the best one. Confidence sets can also be built, reducing the number of models that could be considered as valid candidates [22].
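As a worked example, the sketch below applies (22) to the 1950 AIC values reported in Table III. Because those values are rounded to the nearest integer, the resulting weights (about 0.38 for the RCM and 0.62 for the GRM, with $\Delta_{RCM} = 1 < 2$) only approximate the 1950 AIC weights given in Table IV. The helper name is hypothetical.

```python
import math


def information_weights(scores):
    """Akaike (or BIC) weights: w_i = exp(-Delta_i / 2) / sum_r exp(-Delta_r / 2),
    with Delta_i = score_i - min_r score_r."""
    best = min(scores.values())
    raw = {m: math.exp(-(s - best) / 2) for m, s in scores.items()}
    total = sum(raw.values())
    return {m: v / total for m, v in raw.items()}


# 1950 AIC values from Table III (rounded to the nearest integer in the table)
aic_1950 = {"DCM": 5172, "DDCM": 4796, "RCM": 4646, "GRM": 4645}
weights = information_weights(aic_1950)
print({m: round(w, 2) for m, w in weights.items()})
# {'DCM': 0.0, 'DDCM': 0.0, 'RCM': 0.38, 'GRM': 0.62}
```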



D. Bayesian Information Criterion (BIC) 

Exactly as for the AIC, another quantity can be calculated and used to define the weights of the considered models: the Bayesian Information Criterion. The only difference lies in the term to be discounted from the maximized likelihood:

$$BIC_{NM_i} = K_{NM_i}\ln n - 2\ln\mathcal{L}(\vec\theta^*_{NM_i}) \qquad (23)$$

where the first addendum accounts not only for the number of parameters, K, but also for the cardinality of the sample, n. In our case, n = N(N-1), because we are considering directed matrices. The BIC weights are defined analogously:

$$w^{BIC}_{NM_i} = \frac{e^{-\Delta^{BIC}_{NM_i}/2}}{\sum_{r=1}^{R} e^{-\Delta^{BIC}_{NM_r}/2}} \qquad (24)$$

where $\Delta^{BIC}_{NM_i} = BIC_{NM_i} - \min\{BIC_{NM_r}\}_{r=1}^{R}$, R being the total number of considered null models. Criteria similar to those stated above hold for interpreting the BIC weights. It is commonly said that AIC favours models with a higher number of parameters, while BIC is more restrictive, favouring models with fewer parameters [22], [23]. Since the discussion is still open, we have presented and compared both.
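The claim that BIC is the more restrictive criterion can be made concrete: each extra parameter costs 2 in AIC and $\ln n$ in BIC, so BIC penalizes parameters more heavily whenever $\ln n > 2$, i.e. $n > e^2 \approx 7.4$, which holds here for every network with $N \geq 4$ nodes since $n = N(N-1)$. The sketch below compares the two penalties for a few illustrative network sizes and then applies (24) to the 1950 BIC values of Table III (rounded in the table); the function names are hypothetical.

```python
import math


def bic(log_likelihood, k, n_obs):
    """BIC = K ln n - 2 ln L(theta*), as in (23)."""
    return k * math.log(n_obs) - 2 * log_likelihood


def penalty_per_parameter(n_nodes):
    """Cost of one extra parameter: 2 for AIC, ln n for BIC, with n = N(N - 1) directed pairs."""
    n_obs = n_nodes * (n_nodes - 1)
    return 2.0, math.log(n_obs)


for N in (3, 4, 100):  # illustrative network sizes
    aic_cost, bic_cost = penalty_per_parameter(N)
    print(N, aic_cost, round(bic_cost, 2), "BIC stricter" if bic_cost > aic_cost else "AIC stricter")

# BIC weights for 1950 from the (rounded) values in Table III
bic_1950 = {"DCM": 5594, "DDCM": 5221, "RCM": 5280, "GRM": 5070}
best = min(bic_1950.values())
raw = {m: math.exp(-(s - best) / 2) for m, s in bic_1950.items()}
weights = {m: v / sum(raw.values()) for m, v in raw.items()}
print(max(weights, key=weights.get), round(weights["GRM"], 4))  # GRM 1.0
```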

IV. Degree-degree correlations, reciprocity, filling

In order to integrate our analysis with previous results in the literature, we will also study the performance of the various models in reproducing some specific structural properties. We consider the in-degree correlations and the out-degree correlations, defined as

$$k_i^{in/in} = \frac{\sum_{j(\neq i)} a_{ji}k_j^{in}}{k_i^{in}}, \qquad k_i^{out/out} = \frac{\sum_{j(\neq i)} a_{ij}k_j^{out}}{k_i^{out}} \qquad (25)$$

the reciprocity r [26] and the filling f [11]:

$$r = \frac{L^{\leftrightarrow}}{L} = \frac{\sum_i\sum_{j(\neq i)} a_{ij}a_{ji}}{\sum_i\sum_{j(\neq i)} a_{ij}}, \qquad f = \frac{W_{tot} - W_{min}}{W_{max} - W_{min}}$$

where $W_{max} = \sum_{n=1}^{L} d_n$, with the pairs of distances ordered in decreasing order, $d_1 \geq d_2 \geq d_3 \geq \dots \geq d_{N(N-1)}$, and $W_{min}$ is the same sum with the pairs of distances ordered in increasing order. The filling was recently introduced to measure the tendency of a network to fill the Euclidean space in which it is embedded [11]. This goal is accomplished by measuring how the geographic distances are distributed over the topological links.
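Both quantities are straightforward to compute from an adjacency matrix and a distance matrix. The sketch below follows the definitions above on a hypothetical four-country network whose countries lie on a line at positions 0, 1, 2, 3: r is the fraction of links that are reciprocated, and f places the observed total distance $W_{tot}$ between the smallest and the largest values attainable with the same number of links L. It assumes $0 < L < N(N-1)$, so that $W_{max} > W_{min}$.

```python
def reciprocity(a):
    """r = L<-> / L = sum_{i != j} a_ij a_ji / sum_{i != j} a_ij."""
    n = len(a)
    links = sum(a[i][j] for i in range(n) for j in range(n) if i != j)
    recip = sum(a[i][j] * a[j][i] for i in range(n) for j in range(n) if i != j)
    return recip / links


def filling(a, d):
    """f = (W_tot - W_min) / (W_max - W_min), where W_max (W_min) is the sum of the
    L largest (smallest) distances among the N(N - 1) ordered pairs."""
    n = len(a)
    pairs = [(i, j) for i in range(n) for j in range(n) if i != j]
    links = sum(a[i][j] for i, j in pairs)
    w_tot = sum(a[i][j] * d[i][j] for i, j in pairs)
    dists = sorted(d[i][j] for i, j in pairs)
    w_min = sum(dists[:links])
    w_max = sum(dists[len(dists) - links:])
    return (w_tot - w_min) / (w_max - w_min)


# Hypothetical toy network (5 links, one reciprocated pair) and distances
a = [[0, 1, 0, 0],
     [1, 0, 1, 0],
     [0, 0, 0, 1],
     [1, 0, 0, 0]]
d = [[0, 1, 2, 3],
     [1, 0, 1, 2],
     [2, 1, 0, 1],
     [3, 2, 1, 0]]
print(reciprocity(a), round(filling(a, d), 4))  # 0.4 0.2857
```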

Different methods were chosen to compare the effectiveness of the four null models in explaining the three quantities above. For the degree-degree correlations, the scatter plots of the observed and the expected $k_i^{in/in}$ and $k_i^{out/out}$ are shown.

For the reciprocity and the filling, a different quantity was defined, to incorporate in a single index the observed value and the value expected under the chosen null model (NM) […], [11]:

$$\rho_{NM} = \frac{r - \langle r\rangle_{NM}}{1 - \langle r\rangle_{NM}}, \qquad \phi_{NM} = \frac{f - \langle f\rangle_{NM}}{1 - \langle f\rangle_{NM}}$$

where, e.g.,

$$\langle r\rangle_{NM} = \frac{\sum_i\sum_{j(\neq i)} p^{NM}_{ij}p^{NM}_{ji}}{\sum_i\sum_{j(\neq i)} p^{NM}_{ij}} \qquad (30)$$

and $p^{NM}_{ij}$ indicates that the choice of a particular null model NM is nothing more than the choice of the corresponding probability coefficients for the adjacency matrix entries. Note also that both ρ and φ are normalized between -1 and 1.
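For null models with independent links, such as the DCM and the DDCM, $\langle a_{ij}a_{ji}\rangle = p_{ij}p_{ji}$, so $\langle r\rangle_{NM}$ can be approximated by $\sum_{i\neq j}p_{ij}p_{ji}/\sum_{i\neq j}p_{ij}$, the ratio of expected values in the spirit of the linear approximation of Section II. The sketch below computes this quantity and $\rho_{NM}$ from a hypothetical probability matrix; the observed value r = 0.83 is borrowed from Table I (1950) purely as an input, and the resulting ρ is not meant to reproduce any entry of that table.

```python
def expected_reciprocity(p):
    """<r>_NM = sum_{i != j} p_ij p_ji / sum_{i != j} p_ij, for independent links."""
    n = len(p)
    num = sum(p[i][j] * p[j][i] for i in range(n) for j in range(n) if i != j)
    den = sum(p[i][j] for i in range(n) for j in range(n) if i != j)
    return num / den


def rho(r_observed, r_expected):
    """rho_NM = (r - <r>_NM) / (1 - <r>_NM); positive means more reciprocity than expected."""
    return (r_observed - r_expected) / (1 - r_expected)


# Hypothetical DCM-like probability matrix (zero diagonal) and an observed reciprocity
p = [[0.0, 0.8, 0.5],
     [0.6, 0.0, 0.4],
     [0.5, 0.2, 0.0]]
r_exp = expected_reciprocity(p)
print(round(r_exp, 4), round(rho(0.83, r_exp), 4))
```

A model that constrains the reciprocity exactly (as the RCM and the GRM do) gives $\langle r\rangle_{NM} = r$ and therefore $\rho_{NM} = 0$.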






Fig. 1. Observed (red) and expected (colored) $k_i^{in/in}$ vs $k_i^{in}$ for the five decades 1950, 1960, 1970, 1980, 1990. The colored trends represent: the DCM (blue), the DDCM (green), the RCM (orange) and the GRM (violet).



V. Results and Discussion 
A. Degree-degree correlations 

The degree-degree correlations are analysed by comparing the observed values of $k_i^{in/in}$ and $k_i^{out/out}$ with their expected values under the four considered null models. The results are shown in Figs. 1 and 2. As a general consideration, the four null models perform qualitatively in approximately the same way, as the superposition of the four colored trends shows.

However, by looking carefully at the quantitative differences, we find that the expected trends under the DCM are improved by all the remaining three models, which preserve the DCM constraints but also add one or more further constraints. In fact, their trends become less smooth, following the irregularities of the observed, scattered points. In this respect, the RCM seems to perform best, showing the largest deviations from the DCM trend; on the other hand, the DDCM and the GRM, which both add one parameter to the DCM constraints, are very similar to each other.

From a temporal perspective, the 1950 and 1960 networks (first and second rows in Figs. 1 and 2) are the sparsest ones, with the most scattered trends, showing the least agreement between observations and expectations.

B. Reciprocity and filling 

By the definition of ρ and φ, it is clear that both $|\rho| \leq 1$ and $|\phi| \leq 1$. Moreover, the denominator is a positive normalization constant that does not contribute to the sign of the quantity itself. So the sign of ρ and φ is decided only by the relative magnitude of the observed values, r and f, and their expectations: a positive sign indicates a stronger than expected tendency to reciprocate or to fill the embedding space, while a negative sign indicates a weaker than expected tendency to reciprocate or to fill the embedding space.

Let us consider the reciprocity. From Fig. 3 and Table I it is clear that the DCM and the DDCM perform almost the same in explaining the observed reciprocity: the positive sign of ρ indicates the tendency of the network to reciprocate more than expected (under both the DCM and the DDCM), but the addition of the information about the geographic distances improves the agreement between the observed and the expected r, as signalled by a lower value of the corresponding ρ. Note that both the RCM and the GRM incorporate the information encoded into the reciprocity: so, by definition, $\rho_{RCM} = \rho_{GRM} = 0$.

Now, let us consider the filling (Fig. 4 and Table II). In this case, the DCM, the GRM and the RCM perform the same, indicating a tendency of the network to fill the embedding space less than expected (in fact, the sign is negative). In other words, the addition of the information about the (global or local) reciprocity structure does not seem to add anything to what is predicted by the directed degree sequences alone. Here, the model incorporating the information carried by the filling is the DDCM, for which $\phi_{DDCM} = 0$ by definition.

C. AIC and BIC criteria 

The previous subsections have shown semi-quantitative attempts to test the effectiveness of the four null models in explaining the observed patterns. The above results appear to qualitatively confirm that the strongest factor shaping the WTW topology is the degree sequence [18], but small improvements can be made either by adding spatial factors [11] or reciprocity effects […]. In order to quantify how "small" the improvements are, and whether they are statistically significant, we use the criteria introduced in Sections III-B and III-D. Table III shows the AIC and the BIC values for the null models considered so far. Apart from 1950, AIC favours the RCM for all the years, i.e. the model with the highest number of parameters, which specifies the local reciprocity structure of the observed network. This is compatible both with the trends shown in the scatter plots and with the value of $\rho_{RCM}$ which, being zero, provides the best prediction for the expected value of r. For the filling prediction, the RCM and the GRM perform the same.

BIC, on the other hand, always favours the GRM, the model which adds to the DCM only the global information about the reciprocity. This is compatible both with the value of $\rho_{GRM}$ and with the filling prediction; we have already commented on the small difference between the degree-degree correlations predicted under the RCM and the GRM.

Fig. 2. Observed (red) and expected (colored) $k_i^{out/out}$ vs $k_i^{out}$ for the five decades 1950, 1960, 1970, 1980, 1990. The colored trends represent: the DCM (blue), the DDCM (green), the RCM (orange) and the GRM (violet).

Fig. 3. Observed reciprocity, r (red), $\rho_{DCM}$ (blue) and $\rho_{DDCM}$ (green). By definition, $\rho_{RCM} = \rho_{GRM} = 0$.

TABLE I
r AND ρ FOR THE FOUR NULL MODELS.

Year   r      ρ_DCM   ρ_DDCM   ρ_RCM   ρ_GRM
1950   0.83   0.43    0.39     0       0
1960   0.84   0.50    0.48     0       0
1970   0.85   0.53    0.50     0       0
1980   0.86   0.52    0.49     0       0
1990   0.89   0.54    0.50     0       0

So, by looking only at the AIC and BIC values, we are left with two possible models to choose between: the RCM and the GRM. Let us calculate the weights, as shown in Table IV. Apart from 1950, AIC weights always favour the RCM, which is the model with the highest probability of being the most correct. For the year 1950, the RCM and the GRM compete and should both be retained […], [23]. On the other hand, BIC weights always favour the GRM, which seems to be accurate enough to give the best prediction.

With respect to the DCM, the DDCM (i.e. the DCM with the addition of the geographic distances) is actually better, as signalled by lower values of AIC and BIC. Since the degree sequences are known to be positively correlated with the GDPs of the world countries, this means that considering the distances in addition to the GDPs improves the prediction of the model.

What about the GRM? Like the DDCM, the GRM adds only one parameter to the DCM: in fact, they have the same number of parameters, i.e. 2N + 1. However, in the DDCM it is sufficient to introduce the parameter z to consider the whole matrix of geographic distances that, in turn, affects every single connection probability $p_{ij}$. On the contrary, in the GRM the parameter z only introduces one quantity, $L^{\leftrightarrow}$, and remains the same for every pair of nodes. On the basis of this apparent convenience, we could be tempted to choose the DDCM as the better of the two, but this is not supported by the two criteria, which confirm the GRM as the best model between them (and, eventually, among all). This remains valid also for the year 1950, even if BIC indicates the DDCM as preferable to the RCM: in any case, the GRM outperforms both. This indicates that, in order to have a good prediction of the WTW in 1950, the whole local reciprocity structure is redundant: the global information about the distances could be a better choice, but the best choice is represented by the global information about the reciprocity structure.

So, given the DCM constraints (the in-degree and out-degree sequences), the next best choice for imposing an additional constraint does not involve the distances between countries but the global reciprocity structure of the trade-exchange network. In other words, given the GDPs of the world countries, a better choice than the common one would be the definition of a gravity model incorporating the information about reciprocal trade exchanges.

Fig. 4. Observed filling, f (red), $\phi_{DCM}$ (blue), $\phi_{RCM}$ (orange), $\phi_{GRM}$ (purple). By definition, $\phi_{DDCM} = 0$.

TABLE II
f AND φ FOR THE FOUR NULL MODELS.

Year   f      φ_DCM   φ_DDCM   φ_RCM   φ_GRM
1950   0.40   -0.13   0        -0.13   -0.13
1960   0.40   -0.11   0        -0.11   -0.11
1970   0.39   -0.15   0        -0.15   -0.15
1980   0.42   -0.16   0        -0.16   -0.16
1990   0.41   -0.14   0        -0.14   -0.14

TABLE III
AIC AND BIC FOR THE CONSIDERED NULL MODELS (ROUNDED TO THE NEAREST INTEGER).

Year   AIC_DCM   AIC_DDCM   AIC_RCM   AIC_GRM
1950   5172      4796       4646      4645
1960   9840      9360       8576      8593
1970   16816     15818      14218     14406
1980   20539     19135      17435     17680
1990   20496     19170      17165     17492

Year   BIC_DCM   BIC_DDCM   BIC_RCM   BIC_GRM
1950   5594      5221       5280      5070
1960   10471     9994       9523      9227
1970   17640     16645      15454     15233
1980   21523     20122      18911     18667
1990   21537     20215      18727     18537



TABLE IV
AIC WEIGHTS AND BIC WEIGHTS FOR THE CONSIDERED NULL MODELS.

Year   w^AIC_DCM   w^AIC_DDCM   w^AIC_RCM   w^AIC_GRM
1950   0           0            0.36        0.64
1960   0           0            1           0
1970   0           0            1           0
1980   0           0            1           0
1990   0           0            1           0

Year   w^BIC_DCM   w^BIC_DDCM   w^BIC_RCM   w^BIC_GRM
1950   0           0            0           1
1960   0           0            0           1
1970   0           0            0           1
1980   0           0            0           1
1990   0           0            0           1

VI. Conclusion

In this paper we have considered four null models to analyse five decades of the World Trade Web, represented as binary, directed networks. The adopted approach differs from that of the gravity models (or of the zero-inflated gravity models, created to manage binary networks), making use of the exponential random graph formalism. We have therefore rephrased the problem of distances by suitably defining structural constraints in terms of the adjacency matrix elements.

Starting from the DCM Hamiltonian, we have considered additional, different topological quantities to test the effectiveness of the geographic distances in explaining the binary structure of the WTW and to compare them with the other types of chosen constraints.

The geographic distances were introduced by means of a global index added to the DCM, but we found only a slight improvement over the latter. In the same way, another global index was introduced to consider the global reciprocity structure of the network. What emerges from the statistical criteria used to identify the best model is that, actually, two models compete and (unless one considers the multimodel averaging inference alternative [22], [23]) should both be retained: the RCM and the GRM (the former having already performed successfully in the motif analysis of the WTW [7]). In the same way, if we calculate the AIC and the BIC weights between the DDCM and the GRM (which have the same number of parameters but exploit different information: all the distances, or only the total number of reciprocal links), the latter always performs better than the former (with probability 1).

It should be noted that, in principle, geography effects might already be present in the degree sequence, so that controlling for the latter automatically controls (at least partially) for the distances. Therefore, the correct way to interpret our results would be that the role of geography, if present, is almost entirely encoded within the degree sequence, so that the additional explicit inclusion of a distance constraint does not improve the modeling significantly. However, we do not expect distances to be significantly encoded into the (reciprocated or not) degree sequence, for various reasons. First of all, the degrees of countries are known to depend strongly on the GDP [8]. The latter varies over many orders of magnitude, while distances vary only within a narrow range. Secondly, degrees are local (vertex-specific) properties, whereas distances are pairwise (edge-specific) properties. By preserving only the degrees, the DCM breaks the possible original associations between connectivity and distances, but still reproduces the WTW well. Finally, as is clear from the factorized form of the DDCM probability, the inclusion of distances in the DDCM does not introduce a probabilistic dependence between a link and its reciprocal one, a dependence which is instead produced by the inclusion of reciprocity in the RCM and the GRM. Therefore, we do not expect distance effects to be encoded in the reciprocated degree sequences either, because the strong reciprocity of the latter could not be explained by the DDCM, nor even by gravity models.

Our results conclusively show that, although spatial effects are indeed present in the WTW topology, they are entirely dominated by the non-spatial effects determined by the reciprocity. This suggests preferring the reciprocity structure of the network (local, with the RCM, or global, with the GRM) to the geographic distances in order to obtain more precise predictions about trade exchanges. This, in turn, implies that the information coming from the GDPs, in a gravity model framework, should be supplemented by some other economic indicator about the reciprocal trade activity of the involved countries.